~GitHub/inattention-populationsample/code/inattention-populationsample-data-prep.Rmd
Inattentive behavior is associated with academic problems. The present study investigates primary school teacher reports on nine items reflecting different aspects of inattention, with an aim to reveal patterns of behavior predicting high-school academic achievement. To that end, we used different types of pattern analysis and machine learning methods.
Inattention in a sample of 2397 individuals was rated by their primary school teachers when they participated in the first wave of the Bergen Child Study (BCS) (7-9 years old), and their academic achievements were available from an official school register when attending high school (16-19 years old). Inattention was assessed by nine items rated at a categorical level, and the academic achievement scores were divided into three parts including a similar number of participants.
Boys obtained higher inattention scores and lower academic scores than girls. Inattention problems related to sustained attention and distractibility turned out to have the highest predictive value of academic achievement level across all selected statistical analyses, and the full model showed that inattention explained about 10% of the variance in high school scores about 10 years later. A high odds ratio of being allocated to the lowest academic achievement category was shown by a multinomial regression analysis, while a pattern of problems related to sustained attention and distractibility was revealed by generating classification trees. By including recursive learning algorithms, the most successful classification was found between these inattention items and the highest level of achievement scores.
The present study showed the importance of a pattern of early problems related to sustained attention and distractibility in predicting future academic results. By including different statistical classification models we showed that this pattern was fairly consistent. Furthermore, calculation of classification errors gave information about the uncertainty when predicting the outcome for individual children. Further studies should include a wider range of variables.
Organization of the data and the analysis:
Libraries being used:
Input file:
Output files (data):
# fn <- "../data2/inattention_Arvid_new.sav"
fn <- "../Dropbox/Arvid_inatteion/data2/inattention_Arvid_new.sav"
# The original SPSS file as provided to AJL is
# 'inattention_Astri_94_96_new_grades_updated.sav',
# which was edited and reduced by AJL to 'inattention_Arvid_new.sav'
# Import data stored in the SPSS format
library(memisc)
# fn <- "../data/inattention_Arvid_new.sav"
fn <- "/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav"
data <- as.data.set(spss.system.file(fn))
# Make new data frame from the sample with the variables
# gender, grade, SNAP1, ..., SNAP9 (vars #1-11) and
# academic_achievement (var #52)
names(data)
[1] "gender" "grade" "snap1" "snap2"
[5] "snap3" "snap4" "snap5" "snap6"
[9] "snap7" "snap8" "snap9" "snap10"
[13] "snap11" "snap12" "snap13" "snap14"
[17] "snap15" "snap16" "snap17" "snap18"
[21] "y_4_asrs_1" "y_4_asrs_2" "y_4_asrs_3" "y_4_asrs_4"
[25] "y_4_asrs_5" "y_4_asrs_6" "y_4_asrs_7" "y_4_asrs_8"
[29] "y_4_asrs_9" "y_4_asrs_10" "y_4_asrs_11" "y_4_asrs_12"
[33] "y_4_asrs_13" "y_4_asrs_14" "y_4_asrs_15" "y_4_asrs_16"
[37] "y_4_asrs_17" "y_4_asrs_18" "y_4_mfq_1" "y_4_mfq_2"
[41] "y_4_mfq_3" "y_4_mfq_4" "y_4_mfq_5" "y_4_mfq_6"
[45] "y_4_mfq_7" "y_4_mfq_8" "y_4_mfq_9" "y_4_mfq_10"
[49] "y_4_mfq_11" "y_4_mfq_12" "y_4_mfq_13" "academic_achievement"
d <- data[, c(1:11, 52)]
dim(d)
[1] 10870 12
names(d)
[1] "gender" "grade" "snap1" "snap2"
[5] "snap3" "snap4" "snap5" "snap6"
[9] "snap7" "snap8" "snap9" "academic_achievement"
str(d)
Data set with 10870 obs. of 12 variables:
$ gender : Nmnl. item w/ 2 labels for 0,1 num NA NA NA NA NA NA NA NA NA NA ...
$ grade : Itvl. item + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap1 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap2 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap3 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap4 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap5 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap6 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap7 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap8 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap9 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ academic_achievement: Itvl. item num 2.86 NA 3 3.67 4.1 ...
summary(d)
gender grade snap1 snap2 snap3
Girl:5528 Min. : 2.00 Not true :2646 Not true :2698 Not true :2810
Boy :4978 1st Qu. : 2.00 Somewhat true : 350 Somewhat true : 294 Somewhat true : 225
* : 0 Median : 3.00 Certainly true: 61 Certainly true: 65 Certainly true: 23
NAs : 364 Mean : 2.84 * : 0 * : 0 * : 0
3rd Qu. : 3.50 NAs :7813 NAs :7813 NAs :7812
Max. : 4.00
Missings: 0.00
NAs :7719.00
snap4 snap5 snap6 snap7
Not true :2806 Not true :2783 Not true :2784 Not true :2927
Somewhat true : 229 Somewhat true : 225 Somewhat true : 223 Somewhat true : 96
Certainly true: 22 Certainly true: 49 Certainly true: 49 Certainly true: 18
* : 0 * : 0 * : 0 * : 0
NAs :7813 NAs :7813 NAs :7814 NAs :7829
snap8 snap9 academic_achievement
Not true :2260 Not true :2733 Min. : 1.000
Somewhat true : 669 Somewhat true : 288 1st Qu. : 3.286
Certainly true: 127 Certainly true: 37 Median : 3.889
* : 0 * : 0 Mean : 3.824
NAs :7814 NAs :7812 3rd Qu. : 4.444
Max. : 6.000
Missings: 0.000
NAs :2204.000
# Get observations of data frame that have missing values and those with complete cases
library(psych)
d.miss <- d[!complete.cases(d),]
d.nomiss <- d[complete.cases(d),]
str(d.nomiss)
Data set with 2397 obs. of 12 variables:
$ gender : Nmnl. item w/ 2 labels for 0,1 num 0 0 0 0 0 0 0 0 0 0 ...
$ grade : Itvl. item + ms.v. num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap2 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap3 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 1 0 0 0 0 0 0 0 0 ...
$ snap4 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap5 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap6 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap7 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap8 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 1 0 ...
$ snap9 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ academic_achievement: Itvl. item num 4.67 3.67 4.14 4.11 4.3 ...
headTail(as.data.frame(d.nomiss))
summary(d.nomiss)
gender grade snap1 snap2 snap3
Girl:1256 Min. :2.000 Not true :2079 Not true :2117 Not true :2201
Boy :1141 1st Qu.:2.000 Somewhat true : 272 Somewhat true : 230 Somewhat true : 181
Median :3.000 Certainly true: 46 Certainly true: 50 Certainly true: 15
Mean :2.814
3rd Qu.:3.000
Max. :4.000
snap4 snap5 snap6 snap7
Not true :2217 Not true :2190 Not true :2195 Not true :2312
Somewhat true : 164 Somewhat true : 176 Somewhat true : 170 Somewhat true : 73
Certainly true: 16 Certainly true: 31 Certainly true: 32 Certainly true: 12
snap8 snap9 academic_achievement
Not true :1794 Not true :2142 Min. :1.000
Somewhat true : 510 Somewhat true : 228 1st Qu.:3.556
Certainly true: 93 Certainly true: 27 Median :4.083
Mean :4.023
3rd Qu.:4.556
Max. :5.900
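The complete-case split used above can be illustrated on a small toy data frame. This is a minimal sketch with hypothetical values (the object `toy` and its contents are assumptions for illustration, not the study data); it shows that `complete.cases()` returns one logical value per row and that the two subsets partition the original rows.

```r
# Toy data frame (hypothetical values) with some missing entries
toy <- data.frame(
  gender = c("Girl", "Boy", NA, "Girl"),
  snap1  = c(0, NA, 1, 2),
  ave    = c(4.2, 3.1, 3.9, 5.0)
)

# Logical vector: TRUE for rows without any NA
ok <- complete.cases(toy)

toy.miss   <- toy[!ok, ]   # rows with at least one missing value
toy.nomiss <- toy[ok, ]    # rows with all variables observed

# The two parts together account for every original row
nrow(toy.miss) + nrow(toy.nomiss) == nrow(toy)
```

Because a row is dropped as soon as any of its variables is missing, the size of the complete-case sample depends on which columns are selected before the split.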
D1 <- d.nomiss # For later use
summary(D1$snap1[D1$gender == "Boy"])
Not true Somewhat true Certainly true
935 176 30
summary(D1$snap1[D1$gender == "Girl"])
Not true Somewhat true Certainly true
1144 96 16
# Save the nomiss D to an .csv file without row names for further analysis
D <- d.nomiss
write.csv(D, file = "../data/inattention_nomiss_2397x12.csv",row.names=FALSE)
# For simplicity, we rename (and translate) the variable names in the dataset D, which has no missing values
library(plyr)
d.nomiss <- read.csv(file = "../data/inattention_nomiss_2397x12.csv")
D <- d.nomiss
D <- rename(D, c(academic_achievement="ave"))
D$ave <- as.numeric(D$ave)
D$snap1 <- mapvalues(as.factor(D$snap1), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap1 <- as.numeric(D$snap1)-1
D$snap2 <- mapvalues(as.factor(D$snap2), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap2 <- as.numeric(D$snap2)-1
D$snap3 <- mapvalues(as.factor(D$snap3), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap3 <- as.numeric(D$snap3)-1
D$snap4 <- mapvalues(as.factor(D$snap4), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap4 <- as.numeric(D$snap4)-1
D$snap5 <- mapvalues(as.factor(D$snap5), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap5 <- as.numeric(D$snap5)-1
D$snap6 <- mapvalues(as.factor(D$snap6), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap6 <- as.numeric(D$snap6)-1
D$snap7 <- mapvalues(as.factor(D$snap7), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap7 <- as.numeric(D$snap7)-1
D$snap8 <- mapvalues(as.factor(D$snap8), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap8 <- as.numeric(D$snap8)-1
D$snap9 <- mapvalues(as.factor(D$snap9), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap9 <- as.numeric(D$snap9)-1
D$gender <- mapvalues(as.factor(D$gender), from = c("Girl", "Boy"), to = c("0", "1"))
D$gender <- as.numeric(D$gender)-1
D$grade <- as.numeric(D$grade)
str(D)
'data.frame': 2397 obs. of 12 variables:
$ gender: num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ ave : num 4.67 3.67 4.14 4.11 4.3 ...
headTail(D)
D3 <- D # For later use
# Save D (at early stage) to an .csv file for later analysis in R or MATLAB
write.csv(D, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2.csv",row.names=FALSE)
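One point worth checking in the recoding above: `as.numeric()` applied to a factor returns the internal level codes (1, 2, ...), not the numbers written in the level labels, and factor levels built from character data are sorted alphabetically by default. In the `str(D)` output above, the first observations of `snap1` read 1 where `str(d.nomiss)` showed 0, which would be consistent with this behaviour. The sketch below uses hypothetical toy values to show the difference and one possible alternative that maps labels to codes through a named lookup vector (the name `snap_codes` is an assumption for illustration, not part of the original analysis).

```r
# Hypothetical ratings, as they might look after read.csv()
x <- c("Not true", "Somewhat true", "Not true", "Certainly true")

f <- as.factor(x)
levels(f)            # alphabetical: "Certainly true" "Not true" "Somewhat true"
as.numeric(f) - 1    # level codes minus one: 1 2 1 0 (not 0 1 0 2)

# Alternative: explicit lookup from label to numeric code
snap_codes <- c("Not true" = 0, "Somewhat true" = 1, "Certainly true" = 2)
y <- unname(snap_codes[x])
y                    # 0 1 0 2
```

A lookup of this kind does not depend on the order of the factor levels, so the resulting codes stay the same whether the column arrives as character or as factor.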
# For even more simplicity, we rename (and translate) the variable names in the dataset
# without any missing, reducing the predictor categories to be binary,
# i.e. collapsing SNAP values "1" and "2" to "1":
library(plyr)
D <- d.nomiss
D <- rename(D, c(academic_achievement="ave"))
D$ave <- as.numeric(D$ave)
D$snap1 <- mapvalues(as.factor(D$snap1), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap1 <- as.numeric(D$snap1)-1
D$snap2 <- mapvalues(as.factor(D$snap2), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap2 <- as.numeric(D$snap2)-1
D$snap3 <- mapvalues(as.factor(D$snap3), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap3 <- as.numeric(D$snap3)-1
D$snap4 <- mapvalues(as.factor(D$snap4), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap4 <- as.numeric(D$snap4)-1
D$snap5 <- mapvalues(as.factor(D$snap5), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap5 <- as.numeric(D$snap5)-1
D$snap6 <- mapvalues(as.factor(D$snap6), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap6 <- as.numeric(D$snap6)-1
D$snap7 <- mapvalues(as.factor(D$snap7), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap7 <- as.numeric(D$snap7)-1
D$snap8 <- mapvalues(as.factor(D$snap8), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap8 <- as.numeric(D$snap8)-1
D$snap9 <- mapvalues(as.factor(D$snap9), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap9 <- as.numeric(D$snap9)-1
D$gender <- mapvalues(as.factor(D$gender), from = c("Girl", "Boy"), to = c("0", "1"))
D$gender <- as.numeric(D$gender)-1
D$grade <- as.numeric(D$grade)
str(D)
'data.frame': 2397 obs. of 12 variables:
$ gender: num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 0 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 0 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ ave : num 4.67 3.67 4.14 4.11 4.3 ...
headTail(D)
D2 <- D # For later use
# Save the new D to an .csv file without row names for further analysis
write.csv(D, file = "../data/inattention_nomiss_2397x12_snap_is_0_1.csv",row.names=FALSE)
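The collapse of the three rating categories into a binary predictor can also be written as a direct comparison. This is a minimal sketch on hypothetical toy values, assuming the intended coding is 0 for "Not true" and 1 for either "Somewhat true" or "Certainly true"; the object names are illustrative only.

```r
# Hypothetical three-level ratings
x <- c("Not true", "Somewhat true", "Certainly true", "Not true")

# 1 if the item is rated "Somewhat true" or "Certainly true", else 0
x_bin <- as.integer(x %in% c("Somewhat true", "Certainly true"))
x_bin                      # 0 1 1 0

# The same collapse starting from numeric 0/1/2 codes
x_num <- c(0, 1, 2, 0)
as.integer(x_num >= 1)     # 0 1 1 0
```

Note that `%in%` returns FALSE for missing values, so this form is only appropriate for data without missing ratings, as is the case for the complete-case sample.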
D <- D3
s <- dim(D)
n <- s[1]
p <- s[2]
txt = sprintf("Structure of the %d x %d DATASET", n, p)
print(txt)
[1] "Structure of the 2397 x 12 DATASET"
library(DiagrammeR)
n_txt = sprintf("Dataset \n (N = %d)", n);
gviz <- grViz("
# Circles: predictor variables; Triangle: Outcome variable
digraph Structure_of_the_dataset_D {
# node definitions with substituted label text
node [fontname = Helvetica]
1 [label = 'Dataset \n (N = 2397)', shape=box]
2 [label = 'gender \n {Girl (0) | Boy (1)}', shape=circle]
3 [label = 'grade \n {2 | 3 | 4}', shape=circle]
4 [label = 'ave \n (average marks) \n [1, 6] or {low (L) | medium (M) | high (H)}', shape=triangle]
a [label = 'SNAP \n {0 | 1 | 2}', shape=oval]
b [label = 'SNAP1', shape=circle]
c [label = 'SNAP2', shape=circle]
d [label = 'SNAP3', shape=circle]
e [label = 'SNAP4', shape=circle]
f [label = 'SNAP5', shape=circle]
g [label = 'SNAP6', shape=circle]
h [label = 'SNAP7', shape=circle]
i [label = 'SNAP8', shape=circle]
j [label = 'SNAP9', shape=circle]
# edge definitions with the node IDs
1 -> {2 3 a 4}
a -> {b c d e f g h i j}
}",
engine = "dot")
print(gviz)
NULL
# This does not work using DiagrammeR / GraphViz
# png("../manuscript/Figs/graph_design.png")
# print(gviz)
# dev.off()
# Uses Viewer, Zoom and Screen capture to produce .png and then
# data_prep_structure_grviz_20160203.pdf file
In our analysis we included n = 2397 individuals (none with missing data) from the dataset “/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav”.
D <- D3
n_txt = sprintf("In our analysis we included n = %d individuals (none with missing data) from the dataset '%s'\n", nrow(D), fn);
print(n_txt)
[1] "In our analysis we included n = 2397 individuals (none with missing data) from the dataset '/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav'\n"
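We consider the grades (academic_achievement) both as a continuous variable (for regression) and as a discretized variable (for classification). The variable is the grade point average ("gjennomsnitt"), taken from the register item 'Karaktergjennomsnitt alle gyldige karakterer 1-6 (ikke kroppsøving)', i.e. the average of all valid grades on the 1-6 scale, excluding physical education.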
hist(as.numeric(averBinned))
averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE)
summary(averBinned)
[1,3.75) [3.75,4.43) [4.43,5.9]
779 818 800
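The objects `aver` and `cutpoints` are used above without their definitions being shown. The printed interval bounds and the near-equal group sizes suggest tertile cutpoints; the sketch below shows one way such cutpoints could be obtained with `quantile()`, under that assumption and on simulated scores rather than the study data (all names ending in `_toy` are hypothetical).

```r
set.seed(1)
# Simulated average marks (hypothetical; not the study data)
aver_toy <- runif(300, min = 1, max = 6)

# Assumed definition: minimum, the two tertiles, and maximum
cutpoints_toy <- quantile(aver_toy, probs = c(0, 1/3, 2/3, 1))

# Left-closed intervals; include.lowest = TRUE closes the last interval
# on the right so that the maximum value is not dropped
binned_toy <- cut(aver_toy, cutpoints_toy, right = FALSE,
                  include.lowest = TRUE, labels = c("L", "M", "H"))
table(binned_toy)
```

With `right = FALSE`, a score exactly equal to a cutpoint falls into the higher category, which matches the `[a, b)` interval notation in the output above.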
Make histogram of discretized ‘averBinned’:
hist(as.numeric(averBinned))
Define grade categories “low”, “medium” and “high” in terms of the calculated cut-point intervals:
txt_low <- sprintf("low (L): [%.3f, %.3f)\n", cutpoints[[1]], cutpoints[[2]])
print(txt_low)
[1] "low (L): [1.000, 3.750)\n"
txt_medium <- sprintf("medium (M): [%.3f, %.3f)\n", cutpoints[[2]], cutpoints[[3]])
print(txt_medium)
[1] "medium (M): [3.750, 4.429)\n"
txt_high <- sprintf("high (H): [%.3f, %.3f]\n", cutpoints[[3]], cutpoints[[4]])
print(txt_high)
[1] "high (H): [4.429, 5.900]\n"
library(psych)
# Dataset for classification based on D3 and discretized average academic achievement
C <- D3
C$averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE,
labels=c("L","M","H"))
C <- subset(C, select = -c(ave))
str(C)
'data.frame': 2397 obs. of 12 variables:
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ averBinned: Factor w/ 3 levels "L","M","H": 3 1 2 2 2 2 1 2 3 2 ...
headTail(as.data.frame(C))
headTail(as.data.frame(D3))
# Save the dataset C with numerical SNAP predictors (0/1/2) and three-level outcome (L/M/H) to a .csv file
# for further analysis
write.csv(C, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2_outcome_is_L_M_H.csv",row.names=FALSE)
# Dataset for classification based on D3 and discretized average academic achievement
E <- D3
E$averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE,
labels=c("0","1","2"))
E <- subset(E, select = -c(ave))
str(E)
'data.frame': 2397 obs. of 12 variables:
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ averBinned: Factor w/ 3 levels "0","1","2": 3 1 2 2 2 2 1 2 3 2 ...
summary(E)
gender grade snap1 snap2 snap3 snap4
Min. :0.000 Min. :2.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :3.000 Median :1.000 Median :1.000 Median :1.000 Median :1.000
Mean :0.524 Mean :2.814 Mean :1.094 Mean :1.075 Mean :1.069 Mean :1.062
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :1.000 Max. :4.000 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
snap5 snap6 snap7 snap8 snap9 averBinned
Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 0:779
1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1:818
Median :1.00 Median :1.000 Median :1.000 Median :1.000 Median :1.000 2:800
Mean :1.06 Mean :1.058 Mean :1.025 Mean :1.174 Mean :1.084
3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :2.00 Max. :2.000 Max. :2.000 Max. :2.000 Max. :2.000
headTail(as.data.frame(E))
headTail(as.data.frame(D3))
# Save the dataset E with numerical SNAP predictors and trinary outcome to an .csv file
# for further analysis
write.csv(E, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2_outcome_is_0_1_2.csv",row.names=FALSE)
library(Hmisc)
describe(C)
C
12 Variables 2397 Observations
----------------------------------------------------------------------------------------------
gender
n missing unique
2397 0 2
G (1141, 48%), B (1256, 52%)
----------------------------------------------------------------------------------------------
grade
n missing unique
2397 0 3
2nd (1008, 42%), 3rd (827, 35%), 4th (562, 23%)
----------------------------------------------------------------------------------------------
snap1
n missing unique
2397 0 3
N (46, 2%), S (2079, 87%), C (272, 11%)
----------------------------------------------------------------------------------------------
snap2
n missing unique
2397 0 3
N (50, 2%), S (2117, 88%), C (230, 10%)
----------------------------------------------------------------------------------------------
snap3
n missing unique
2397 0 3
N (15, 1%), S (2201, 92%), C (181, 8%)
----------------------------------------------------------------------------------------------
snap4
n missing unique
2397 0 3
N (16, 1%), S (2217, 92%), C (164, 7%)
----------------------------------------------------------------------------------------------
snap5
n missing unique
2397 0 3
N (31, 1%), S (2190, 91%), C (176, 7%)
----------------------------------------------------------------------------------------------
snap6
n missing unique
2397 0 3
N (32, 1%), S (2195, 92%), C (170, 7%)
----------------------------------------------------------------------------------------------
snap7
n missing unique
2397 0 3
N (12, 1%), S (2312, 96%), C (73, 3%)
----------------------------------------------------------------------------------------------
snap8
n missing unique
2397 0 3
N (93, 4%), S (1794, 75%), C (510, 21%)
----------------------------------------------------------------------------------------------
snap9
n missing unique
2397 0 3
N (27, 1%), S (2142, 89%), C (228, 10%)
----------------------------------------------------------------------------------------------
averBinned
n missing unique
2397 0 3
H (800, 33%), L (779, 32%), M (818, 34%)
----------------------------------------------------------------------------------------------
# Save the dataset C with SNAP predictors as factors and trinary outcome to an .csv file
# for further analysis
write.csv(C, file = "../data/inattention_nomiss_2397x12_snap_is_N_S_C_outcome_is_L_M_H.csv",row.names=FALSE)
library(Hmisc)
describe(C)
C
12 Variables 2397 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct
2397 0 2
Value G B
Frequency 1141 1256
Proportion 0.476 0.524
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
2397 0 3
Value 2nd 3rd 4th
Frequency 1008 827 562
Proportion 0.421 0.345 0.234
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
2397 0 3
Value N S C
Frequency 46 2079 272
Proportion 0.019 0.867 0.113
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
2397 0 3
Value N S C
Frequency 50 2117 230
Proportion 0.021 0.883 0.096
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
2397 0 3
Value N S C
Frequency 15 2201 181
Proportion 0.006 0.918 0.076
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
2397 0 3
Value N S C
Frequency 16 2217 164
Proportion 0.007 0.925 0.068
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
2397 0 3
Value N S C
Frequency 31 2190 176
Proportion 0.013 0.914 0.073
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
2397 0 3
Value N S C
Frequency 32 2195 170
Proportion 0.013 0.916 0.071
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
2397 0 3
Value N S C
Frequency 12 2312 73
Proportion 0.005 0.965 0.030
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
2397 0 3
Value N S C
Frequency 93 1794 510
Proportion 0.039 0.748 0.213
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
2397 0 3
Value N S C
Frequency 27 2142 228
Proportion 0.011 0.894 0.095
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct
2397 0 3
Value H L M
Frequency 800 779 818
Proportion 0.334 0.325 0.341
-----------------------------------------------------------------------------------------------------------
library(pander)
panderOptions("digits", 5)
pander(summary(C))
-------------------------------------------------------------------------
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7
-------- -------- ------- ------- ------- ------- ------- ------- -------
G:1141 2nd:1008 N: 46 N: 50 N: 15 N: 16 N: 31 N: 32 N: 12
B:1256 3rd: 827 S:2079 S:2117 S:2201 S:2217 S:2190 S:2195 S:2312
NA 4th: 562 C: 272 C: 230 C: 181 C: 164 C: 176 C: 170 C: 73
-------------------------------------------------------------------------
Table: Table continues below
----------------------------
snap8 snap9 averBinned
------- ------- ------------
N: 93 N: 27 H:800
S:1794 S:2142 L:779
C: 510 C: 228 M:818
----------------------------
pander(summary(E))
---------------------------------------------------------------------
gender grade snap1 snap2 snap3
------------- ------------- ------------- ------------- -------------
Min. :0.000 Min. :2.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :3.000 Median :1.000 Median :1.000 Median :1.000
Mean :0.524 Mean :2.814 Mean :1.094 Mean :1.075 Mean :1.069
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :1.000 Max. :4.000 Max. :2.000 Max. :2.000 Max. :2.000
---------------------------------------------------------------------
Table: Table continues below
--------------------------------------------------------------------
snap4 snap5 snap6 snap7 snap8
------------- ------------ ------------- ------------- -------------
Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:1.000 1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :1.00 Median :1.000 Median :1.000 Median :1.000
Mean :1.062 Mean :1.06 Mean :1.058 Mean :1.025 Mean :1.174
3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :2.000 Max. :2.00 Max. :2.000 Max. :2.000 Max. :2.000
--------------------------------------------------------------------
Table: Table continues below
--------------------------
snap9 averBinned
------------- ------------
Min. :0.000 0:779
1st Qu.:1.000 1:818
Median :1.000 2:800
Mean :1.084 NA
3rd Qu.:1.000 NA
Max. :2.000 NA
--------------------------
Describe subsets of data according to academic achievement and gender
C.girls.L <- C[ which(C$gender=='G' & C$averBinned=='L'), ]
C.girls.H <- C[ which(C$gender=='G' & C$averBinned=='H'), ]
C.boys.L <- C[ which(C$gender=='B' & C$averBinned=='L'), ]
C.boys.H <- C[ which(C$gender=='B' & C$averBinned=='H'), ]
summary(C.girls.L)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G:447 2nd:183 N: 23 N: 33 N: 9 N: 9 N: 17 N: 16 N: 5 N: 48 N: 14 H: 0
B: 0 3rd:163 S:328 S:313 S:361 S:365 S:353 S:351 S:406 S:226 S:355 L:447
4th:101 C: 96 C:101 C: 77 C: 73 C: 77 C: 80 C: 36 C:173 C: 78 M: 0
summary(C.girls.H)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G:305 2nd:118 N: 1 N: 3 N: 0 N: 0 N: 5 N: 2 N: 0 N: 7 N: 2 H:305
B: 0 3rd:105 S:282 S:286 S:284 S:289 S:284 S:289 S:300 S:245 S:281 L: 0
4th: 82 C: 22 C: 16 C: 21 C: 16 C: 16 C: 14 C: 5 C: 53 C: 22 M: 0
summary(C.boys.L)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G: 0 2nd:128 N: 6 N: 3 N: 1 N: 0 N: 3 N: 3 N: 3 N: 12 N: 2 H: 0
B:332 3rd:100 S:284 S:284 S:311 S:304 S:303 S:299 S:321 S:245 S:295 L:332
4th:104 C: 42 C: 45 C: 20 C: 28 C: 26 C: 30 C: 8 C: 75 C: 35 M: 0
summary(C.boys.H)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
G: 0 2nd:232 N: 2 N: 0 N: 0 N: 0 N: 0 N: 0 N: 0 N: 1 N: 1 H:495
B:495 3rd:146 S:474 S:488 S:486 S:491 S:489 S:493 S:493 S:450 S:476 L: 0
4th:117 C: 19 C: 7 C: 9 C: 4 C: 6 C: 2 C: 2 C: 44 C: 18 M: 0
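The four subsets above correspond to cells of a gender by achievement cross-tabulation. As a sketch on hypothetical toy data (the data frame `toy` and its values are assumptions for illustration), `table()` gives the cell counts directly and `subset()` offers an equivalent way of extracting one cell.

```r
# Hypothetical toy data with the same factor coding as C
toy <- data.frame(
  gender     = factor(c("G", "B", "G", "B", "G", "B"), levels = c("G", "B")),
  averBinned = factor(c("L", "H", "H", "L", "L", "M"), levels = c("L", "M", "H"))
)

# Cell counts for every gender x achievement combination
tab <- table(toy$gender, toy$averBinned)
tab

# One cell extracted as a data frame, equivalent to the which() indexing
toy.girls.L <- subset(toy, gender == "G" & averBinned == "L")
nrow(toy.girls.L)
```

The row count of each extracted subset should equal the corresponding cell of the table, which gives a quick consistency check on the subsetting conditions.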
library(Hmisc)
describe(C.girls.L)
C.girls.L
12 Variables 447 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
447 0 1 G
Value G
Frequency 447
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
447 0 3
Value 2nd 3rd 4th
Frequency 183 163 101
Proportion 0.409 0.365 0.226
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
447 0 3
Value N S C
Frequency 23 328 96
Proportion 0.051 0.734 0.215
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
447 0 3
Value N S C
Frequency 33 313 101
Proportion 0.074 0.700 0.226
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
447 0 3
Value N S C
Frequency 9 361 77
Proportion 0.020 0.808 0.172
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
447 0 3
Value N S C
Frequency 9 365 73
Proportion 0.020 0.817 0.163
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
447 0 3
Value N S C
Frequency 17 353 77
Proportion 0.038 0.790 0.172
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
447 0 3
Value N S C
Frequency 16 351 80
Proportion 0.036 0.785 0.179
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
447 0 3
Value N S C
Frequency 5 406 36
Proportion 0.011 0.908 0.081
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
447 0 3
Value N S C
Frequency 48 226 173
Proportion 0.107 0.506 0.387
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
447 0 3
Value N S C
Frequency 14 355 78
Proportion 0.031 0.794 0.174
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
447 0 1 L
Value L
Frequency 447
Proportion 1
-----------------------------------------------------------------------------------------------------------
describe(C.girls.H)
C.girls.H
12 Variables 305 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
305 0 1 G
Value G
Frequency 305
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
305 0 3
Value 2nd 3rd 4th
Frequency 118 105 82
Proportion 0.387 0.344 0.269
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
305 0 3
Value N S C
Frequency 1 282 22
Proportion 0.003 0.925 0.072
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
305 0 3
Value N S C
Frequency 3 286 16
Proportion 0.010 0.938 0.052
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
305 0 2
Value S C
Frequency 284 21
Proportion 0.931 0.069
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
305 0 2
Value S C
Frequency 289 16
Proportion 0.948 0.052
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
305 0 3
Value N S C
Frequency 5 284 16
Proportion 0.016 0.931 0.052
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
305 0 3
Value N S C
Frequency 2 289 14
Proportion 0.007 0.948 0.046
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
305 0 2
Value S C
Frequency 300 5
Proportion 0.984 0.016
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
305 0 3
Value N S C
Frequency 7 245 53
Proportion 0.023 0.803 0.174
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
305 0 3
Value N S C
Frequency 2 281 22
Proportion 0.007 0.921 0.072
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
305 0 1 H
Value H
Frequency 305
Proportion 1
-----------------------------------------------------------------------------------------------------------
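The Proportion row in each block is the Frequency row divided by the number of observations, rounded to three decimals. As a quick sanity check, the snap8 block of `C.girls.H` above can be recomputed by hand. The vector name `freq` is only illustrative and is not part of the original notebook.

```r
# Frequencies copied from the snap8 block of describe(C.girls.H) above
freq <- c(N = 7, S = 245, C = 53)

# Total number of observations in the subset
n <- sum(freq)

# Proportions rounded the same way as in the describe() output
prop <- round(freq / n, 3)
print(n)
print(prop)
```

The same check applies to any other block by swapping in its Frequency row.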
describe(C.boys.L)
C.boys.L
12 Variables 332 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
332 0 1 B
Value B
Frequency 332
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
332 0 3
Value 2nd 3rd 4th
Frequency 128 100 104
Proportion 0.386 0.301 0.313
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
332 0 3
Value N S C
Frequency 6 284 42
Proportion 0.018 0.855 0.127
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
332 0 3
Value N S C
Frequency 3 284 45
Proportion 0.009 0.855 0.136
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
332 0 3
Value N S C
Frequency 1 311 20
Proportion 0.003 0.937 0.060
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
332 0 2
Value S C
Frequency 304 28
Proportion 0.916 0.084
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
332 0 3
Value N S C
Frequency 3 303 26
Proportion 0.009 0.913 0.078
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
332 0 3
Value N S C
Frequency 3 299 30
Proportion 0.009 0.901 0.090
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
332 0 3
Value N S C
Frequency 3 321 8
Proportion 0.009 0.967 0.024
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
332 0 3
Value N S C
Frequency 12 245 75
Proportion 0.036 0.738 0.226
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
332 0 3
Value N S C
Frequency 2 295 35
Proportion 0.006 0.889 0.105
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
332 0 1 L
Value L
Frequency 332
Proportion 1
-----------------------------------------------------------------------------------------------------------
describe(C.boys.H)
C.boys.H
12 Variables 495 Observations
-----------------------------------------------------------------------------------------------------------
gender
n missing distinct value
495 0 1 B
Value B
Frequency 495
Proportion 1
-----------------------------------------------------------------------------------------------------------
grade
n missing distinct
495 0 3
Value 2nd 3rd 4th
Frequency 232 146 117
Proportion 0.469 0.295 0.236
-----------------------------------------------------------------------------------------------------------
snap1
n missing distinct
495 0 3
Value N S C
Frequency 2 474 19
Proportion 0.004 0.958 0.038
-----------------------------------------------------------------------------------------------------------
snap2
n missing distinct
495 0 2
Value S C
Frequency 488 7
Proportion 0.986 0.014
-----------------------------------------------------------------------------------------------------------
snap3
n missing distinct
495 0 2
Value S C
Frequency 486 9
Proportion 0.982 0.018
-----------------------------------------------------------------------------------------------------------
snap4
n missing distinct
495 0 2
Value S C
Frequency 491 4
Proportion 0.992 0.008
-----------------------------------------------------------------------------------------------------------
snap5
n missing distinct
495 0 2
Value S C
Frequency 489 6
Proportion 0.988 0.012
-----------------------------------------------------------------------------------------------------------
snap6
n missing distinct
495 0 2
Value S C
Frequency 493 2
Proportion 0.996 0.004
-----------------------------------------------------------------------------------------------------------
snap7
n missing distinct
495 0 2
Value S C
Frequency 493 2
Proportion 0.996 0.004
-----------------------------------------------------------------------------------------------------------
snap8
n missing distinct
495 0 3
Value N S C
Frequency 1 450 44
Proportion 0.002 0.909 0.089
-----------------------------------------------------------------------------------------------------------
snap9
n missing distinct
495 0 3
Value N S C
Frequency 1 476 18
Proportion 0.002 0.962 0.036
-----------------------------------------------------------------------------------------------------------
averBinned
n missing distinct value
495 0 1 H
Value H
Frequency 495
Proportion 1
-----------------------------------------------------------------------------------------------------------
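The subsets summarised above are split by `gender` and `averBinned`. A minimal sketch of how such subsets might be built and described follows. It assumes a parent data frame, here given the hypothetical name `C` and filled with made-up rows. The actual subsetting code used in the notebook is not shown in this output and may differ.

```r
# Toy stand-in for the real data: hypothetical name `C`, made-up rows
C <- data.frame(
  gender     = factor(c("G", "G", "B", "B", "B", "G")),
  averBinned = factor(c("L", "H", "L", "H", "H", "H")),
  snap8      = factor(c("S", "C", "N", "S", "C", "S"),
                      levels = c("N", "S", "C"))
)

# One subset per gender-by-achievement combination described above
C.girls.H <- subset(C, gender == "G" & averBinned == "H")
C.boys.L  <- subset(C, gender == "B" & averBinned == "L")
C.boys.H  <- subset(C, gender == "B" & averBinned == "H")

# Hmisc::describe() gives the n / missing / distinct layout shown above;
# fall back to base summary() when Hmisc is not installed
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(C.boys.H))
} else {
  print(summary(C.boys.H))
}
```

Keeping the factor levels fixed (`N`, `S`, `C`) before subsetting makes it easier to compare blocks, since a level with zero cases in a subset, such as `N` for snap3 in `C.girls.H`, is then still visible in `table()` output.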